ABSTRACT

A study concerned with identifying sources of 
interrater variation in ratings posed the following questions: Are 
ratings decomposable into a single nonerror component with interrater 
variations representing individual error components, or is a better 
fit to the data provided by multiple nonerror components representing 
generalized rating styles? And if multiple rating styles are found, 
what are their characteristics? Rated events were 10-minute segments 
from videotapes of high school classes in four different subjects. 

The 50-minute composite videotape was viewed by 83 subjects 
(teachers, teacher trainees, school administrators, and graduate 
students) using a 21-item questionnaire synthesized from a variety of 
sources to sample three aspects of teaching behavior: intended 
objectives, teaching style, and interpersonal climate. The data from 
ratings of the four classrooms with the 21 scales formed an 83 x 21 x 
4 data array. Two analyses were performed on the extended matrix: 
principal component analysis of covariances and correlations between 
rows. Additional analytical procedures were employed to characterize 
generalized rating styles. Conclusions are methodological rather than 
substantive: The analytical procedures offer the possibility of 
providing more information about the quality of ratings than is 
provided by more traditional reliability estimation procedures, and 
provide a basis for selecting raters having rating styles of 
particular interest. (Observation schedule and data tables included.) 
INTRODUCTION



Despite critical commentary about the quality of information provided
by ratings, they continue to be a popular source of data about
classroom behavior, either as criteria of teacher effectiveness or as
indices of operative variables in the classroom situation. Ratings
will undoubtedly continue to be widely used because they are easy and
inexpensive to use and because they often provide abstractive
information not readily available any other way. Claims that ratings are
unreliable (Biddle, 196?) and that they may not measure what they are
intended to measure (Guilford, 1962) suggest scrutiny of several aspects
of rating methods, especially in instructional research. This paper
deals with the specific question of identifying sources of interrater
variation in ratings.

Before proceeding to a description of the problem and procedures
for investigating it, a brief account of the genesis of the problem is
in order. The starting point for the account is the assertion that
ratings are unreliable. The statement is a troublesome one: the term
"reliability" is used ambiguously, and the assertion is, in large part,
undocumented. Strictly speaking, a necessary condition for estimating
reliability of ratings is that a set of raters rate a common set of
events. Estimation of reliability of ratings as stability would require
that a set of raters make repeated observations of the same set of events,
but such conditions are rarely available.

Two empirical approaches predominate in estimating reliability as
equivalence. The first, restricted to multi-item rating devices, is
internal consistency estimation, the preferred procedure probably
being intraclass correlation or another analysis-of-variance-based
procedure. The second, usually but not necessarily restricted to rating
devices producing a single score, involves treating multiple raters as
analogous to equivalent test forms. In both approaches, decomposition
of ratings into independent components is on the basis of the classical
test theory model

    $x_{ers} = t_{es} + e_{ers}$

LaForge (1965) has pointed out that in the multiple rater situation,
there may be more than one way to relate ratings to patterns of behavioral
cues. The classical model essentially takes into account only the most
popular view, when, in fact, minority views may be just as relevant and
just as free of error.

LaForge's article suggested as an alternative that individual
ratings might be decomposable into r independent nonerror components,
each one representing a different way of mapping patterns of cues into
ratings, that is, a different "rating style." The choice of a best-fitting
decomposition model is empirically testable. If the classical model
provides the best fit, the principal components of a matrix of rater
intercorrelations will be found to consist of one component with a
large characteristic root and k - 1 components with much smaller,
approximately equal characteristic roots. If multiple rating styles
are represented in the data, rater intercorrelations will produce
two or more components with large characteristic roots. Determination
of the meaning of "large" will be dealt with later.
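
The contrast between the two decomposition models can be illustrated with simulated data. The following is a minimal sketch, not the procedure used in the study: the number of raters, the number of observations, the unit error variance, and the function name `rater_roots` are all assumptions chosen for illustration.

```python
import numpy as np

def rater_roots(ratings):
    """Characteristic roots (descending) of the rater intercorrelation matrix.

    ratings: array of shape (n_raters, n_observations); rows are raters.
    """
    corr = np.corrcoef(ratings)              # correlations between rows
    return np.linalg.eigvalsh(corr)[::-1]    # eigvalsh returns ascending order

rng = np.random.default_rng(0)
n_raters, n_obs = 12, 200                    # hypothetical sizes

# Classical model: one shared nonerror component plus independent rater error.
true_score = rng.normal(size=n_obs)
single = true_score + rng.normal(size=(n_raters, n_obs))

# Two rating styles: half of the raters follow one component, half another.
style_a = rng.normal(size=n_obs)
style_b = rng.normal(size=n_obs)
two = np.vstack([
    style_a + rng.normal(size=(n_raters // 2, n_obs)),
    style_b + rng.normal(size=(n_raters // 2, n_obs)),
])

roots_single = rater_roots(single)   # one dominant root expected
roots_two = rater_roots(two)         # two dominant roots expected
print(np.round(roots_single[:3], 2))
print(np.round(roots_two[:3], 2))
```

Under these assumptions the first data set yields one root well above the rest, while the second yields two roots of similar size followed by a sharp drop, which is the pattern the argument above describes.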

LaForge's argument is consistent with Remmers' (1963) argument
that ratings are the output of perceptual processes. Remmers' argument
may be extended by considering ratings as responses functionally
related to objective properties of observed events and to internal
perceptual mechanisms of individual raters. Differences between raters in
internal perceptual mechanisms could be represented as differences in
parameter values of functional relationships between event-properties
and perceptual output. This argument suggests the relevance of Tucker's
work (1958, 1966) on the use of principal component analysis in the
determination of parameters of functional relationships. Since one of
the parameters might well be associated with individual differences in
the dispersion of ratings, either over scales or over events, principal
component analysis of covariance matrices also represents an appropriate
basis for identification of generalized rating styles.

The present study can be considered as an extension of the LaForge
study. The basic question is the same: are ratings decomposable into
a single nonerror component with interrater variations representing
individual error components, or is a better fit to the data provided by
multiple nonerror components representing generalized rating styles?
An additional question is posed: if multiple rating styles are found
in a set of rating data, what are the characteristics of the multiple
rating styles? This study differs from the LaForge study in three
other respects: the rated events were videotaped segments of secondary
school classes, the ratings themselves were vectors of scores on
multiple scales rather than single score ratings, and some additional
analytical procedures were employed to characterize generalized
rating styles.



PROCEDURE 



The rated events in the study were ten-minute segments from
videotapes of four classes recorded at University High School in Normal,
Illinois. Ten-minute segments from classes in World History, Chemistry,
General Mathematics, and American History were combined into a
composite allowing three-minute pauses between segments. The composite
videotape was viewed by 83 subjects: 21 teacher trainees, 22 classroom
teachers, 21 school administrators, and 19 graduate students enrolled
in either guidance or school psychology programs.

The rating device was a 21-item questionnaire synthesized from a
variety of sources to sample three aspects of teaching behavior referred
to by Sorenson and Gross (1965): intended objectives of instruction,
teaching style, and interpersonal climate. Seven items were intended
to convey information about elements of a subject-matter mastery
orientation; seven were related to interpersonal climate; and seven
were intended to characterize teaching styles between the extremes of
didactic teaching and discovery teaching. A copy is included in the
Appendix.

The data from ratings of the four classroom behavior samples with
the 21 scales formed an 83 x 21 x 4 data array. Analysis proceeded on
the extended two-way array of 83 row supervectors of four 21-element
vectors (Horst, 1965, pp. 317-324). Two analyses were performed on
this extended matrix: principal component analysis of covariances
between rows, and principal component analysis of correlations between
rows. Analysis of the covariance matrix permitted more detailed analysis
of generalized rating styles. In addition, the analysis of the
covariance matrix produced a reduced matrix of projections of
scale-classroom combinations on the principal components. Unfolding analysis
of order relations among these coefficients provided further
information about characteristics of rating styles.
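
The arrangement of the data array and the two analyses can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the original computations: the ratings are random placeholders on a hypothetical 1-to-5 scale, the row-centering used for the projections is an assumption, and the helper name `principal_components` is introduced only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n_raters, n_scales, n_classes = 83, 21, 4

# Placeholder ratings (hypothetical 1-5 scale); the study's raw data are not given here.
ratings = rng.integers(1, 6, size=(n_raters, n_scales, n_classes)).astype(float)

# Extended two-way array: each rater's row is four 21-element vectors laid end to end.
extended = ratings.transpose(0, 2, 1).reshape(n_raters, n_classes * n_scales)

cov_rows = np.cov(extended)         # 83 x 83 covariances between rows (raters)
corr_rows = np.corrcoef(extended)   # 83 x 83 correlations between rows (raters)

def principal_components(sym):
    """Characteristic roots (descending) and matching vectors of a symmetric matrix."""
    roots, vectors = np.linalg.eigh(sym)
    order = np.argsort(roots)[::-1]
    return roots[order], vectors[:, order]

cov_roots, cov_vectors = principal_components(cov_rows)
corr_roots, corr_vectors = principal_components(corr_rows)

# Projections of the 84 scale-classroom combinations on the first three
# covariance components (rows centered, as np.cov does internally).
centered = extended - extended.mean(axis=1, keepdims=True)
projections = cov_vectors[:, :3].T @ centered   # shape (3, 84)
```

Working between rows rather than between columns makes the raters the variables, so that each component describes a pattern of agreement among raters over the 84 scale-classroom combinations.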

RESULTS 

The characteristic roots of the covariance matrix are presented in
Table 1 of the Appendix, along with increments between successive roots,
variance accounted for by the component associated with each root, and
the cumulative variance associated with successive components. The
same information obtained from analysis of the correlation matrix is
presented in Table 2 of the Appendix. At this point, the question of
how many nonerror components best characterize the data arises.

LaForge cited two criteria for deciding how many components to
retain. The first criterion, a psychometric one, indicates retaining
all components associated with characteristic roots with values greater
than one. For the correlation matrix, this criterion would result in
the retention of 19 components. For the covariance matrix this criterion
is meaningless since the dispersions of individual ratings are not
standardized. The second criterion involves a statistical test of differences
in magnitudes of successive roots. The statistical criterion was not
applicable for this particular correlation matrix because the value of
the determinant, required in making the test, was approximately zero.
The determinant of the covariance matrix was not obtained.
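
The count of 19 under the first criterion can be checked directly against the roots listed in Table 2. The short sketch below uses those roots as read from the table; the function name `count_roots_above` is introduced only for this illustration.

```python
# Characteristic roots of the correlation matrix, as read from Table 2.
table2_roots = [
    23.732, 9.403, 3.768, 3.245, 3.017, 2.828, 2.297,
    2.065, 1.988, 1.848, 1.572, 1.560, 1.458, 1.403,
    1.229, 1.192, 1.128, 1.073, 1.034, 0.979, 0.945,
]

def count_roots_above(roots, threshold=1.0):
    """Number of roots strictly greater than the threshold."""
    return sum(1 for root in roots if root > threshold)

n_retained = count_roots_above(table2_roots)
print(n_retained)
```

Only the first 21 roots are tabulated, but the roots are listed in descending order and the twentieth is already below one, so the count does not depend on the untabulated roots.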

Another criterion has been suggested by Gulliksen (1959), related
to the asymptotic nature of a plot of the magnitude of characteristic
roots as a function of their ordinal number. Application of this
criterion indicates retention of two components of the correlation
matrix and three of the covariance matrix. The difference in the
number of factors between the covariance matrix and correlation matrix
reflects the fact that interrater variations in dispersion of ratings
are retained in the covariance matrix, but not in the correlation
matrix.
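
A rough numerical stand-in for inspecting such a plot is to look at the drops between successive roots and note where they level off. The sketch below applies this to the covariance roots as read from Table 1. The tolerance of 0.5 (in the scaled units of the table) is an arbitrary assumption made for this illustration; the criterion as described above rests on judgment of the plot, not on a fixed cutoff.

```python
# Characteristic roots of the covariance matrix, as read from Table 1 (scaled units).
table1_roots = [
    9.891, 3.555, 1.512, 1.367, 1.252, 1.135, 0.924,
    0.801, 0.783, 0.709, 0.674, 0.599, 0.582, 0.554,
    0.496, 0.452, 0.440, 0.437, 0.392, 0.384, 0.357,
]

def increments(roots):
    """Drop from each root to the next, rounded to three decimals."""
    return [round(a - b, 3) for a, b in zip(roots, roots[1:])]

def components_before_leveling(roots, tolerance):
    """Count leading roots until the drop to the next root first falls below tolerance."""
    for k, drop in enumerate(increments(roots), start=1):
        if drop < tolerance:
            return k
    return len(roots)

drops = increments(table1_roots)
n_components = components_before_leveling(table1_roots, tolerance=0.5)
print(drops[:4], n_components)
```

With this tolerance the drops after the third root are all small, which agrees with the retention of three components of the covariance matrix reported above.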

Loadings of individuals on the principal components of the
correlation matrix are presented in Table 3 of the Appendix. These
loadings represent correlations of ratings of individual raters with
what may be interpreted as the true scores for generalized rating
styles. The first three components of the covariance matrix accounted
for approximately 45 percent of total variance; the first two components
of the correlation matrix accounted for approximately 40 percent of
total variance. The variance accounted for by the first component alone
was approximately 30 percent for the covariance matrix and about
29 percent for the correlation matrix (Tables 1 and 2); hence, a
substantially better fit is provided by the representation of multiple
nonerror components. The large amount of random variation remaining may be due
to the fact that only four events were rated with the 21 scales, attenuating
variance of individual scales over events.
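
The percentages for the correlation matrix follow from dividing each root by the total variance, which for a correlation matrix of 83 raters is 83 (the sum of the unit diagonal). The sketch below is a worked check using the two leading roots as read from Table 2; the function name is introduced only for this illustration.

```python
def percent_of_variance(roots, total):
    """Percent of total variance associated with each root."""
    return [100.0 * root / total for root in roots]

# Two leading roots of the correlation matrix, as read from Table 2.
leading_corr_roots = [23.732, 9.403]

# The trace of an 83 x 83 correlation matrix is 83.
pct = percent_of_variance(leading_corr_roots, total=83)
cumulative_two = sum(pct)
print([round(p, 2) for p in pct], round(cumulative_two, 2))
```

The same division does not apply directly to the covariance matrix, whose total variance is the sum of the unstandardized rater variances rather than the number of raters.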

The coefficients of the 84 classroom-scale observation units for
the three principal components were represented in three 21 x 4 tables.
The three 21 x 4 tables are combined in Table 4 of the Appendix. Each
row of each of the three tables generates a rank ordering of the four
classroom segments on a single scale. The orderings can be interpreted
as representing an order of proximity to the ideal point of a scale for
a rater utilizing each generalized rating style. This interpretation
suggests the applicability of unfolding analysis (Coombs, 1964) for
representation of the characteristics of the generalized rating styles.
The existence of six rankings of a set of four objects (I-scales)
unfoldable into a single rank order and its mirror image (a J-scale)
provides the basis for inference of a single attribute underlying the six
rankings. The existence of more than one set of six unfoldable orders
allows the inference of additional attributes. The orders of the four
classroom segments associated with the three components and the J-scales
recovered from these orders are presented in Tables 5, 6, and 7 in the
Appendix.
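
The relation between a J-scale and the orders that can be folded from it can be sketched in a few lines. The check below uses only the qualitative condition that every leading segment of an order must occupy adjacent positions on the J-scale; this is a necessary condition and admits eight orders for four objects, whereas a J-scale with fixed distances between objects admits only seven of them. The function names, and the convention of ranking from largest to smallest coefficient, are assumptions made for this illustration.

```python
from itertools import permutations

def is_foldable(order, j_scale):
    """True if every leading segment of `order` is a run of adjacent J-scale positions."""
    positions = [j_scale.index(item) for item in order]
    for n in range(1, len(positions) + 1):
        seen = positions[:n]
        if max(seen) - min(seen) != n - 1:
            return False
    return True

def admissible_orders(j_scale):
    """All orders of the J-scale's objects that satisfy the folding condition."""
    return sorted("".join(p) for p in permutations(j_scale) if is_foldable(p, j_scale))

def rank_order(coefficients, labels="ABCD"):
    """Order the segments from largest to smallest coefficient (assumed convention)."""
    pairs = sorted(zip(coefficients, labels), reverse=True)
    return "".join(label for _, label in pairs)

first_j_scale_orders = admissible_orders("BDAC")
print(first_j_scale_orders)
print(rank_order([0.2, 0.9, -0.1, 0.5]))   # hypothetical row of coefficients
```

A J-scale and its mirror image admit the same set of orders, which is why each J-scale is reported together with its mirror image.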

For the first component, rankings of the four classroom segments
produced two J-scales. The first J-scale, defined by the order BDAC
and its mirror image CADB, suggests a contrast between careful
preparation, clear organization, and integration of topics on the one
hand and inattentiveness of students, deficiency in scholarship, and
fault-finding and unfriendliness in the classroom on the other. The
second J-scale, defined by the order DCBA and its mirror image ABCD,
is interpreted as a contrast between acceptance of pupils' ideas and
permissiveness on the one hand and teacher determination of topics and
teacher involvement with the whole class, in contrast to small groups
of pupils, on the other.

For the second and third components, ranking of the classroom
segments was predominantly unidimensional. For the second component,
the ordering attribute is represented by a J-scale defined by the
order CDAB and its mirror image BADC. For the third component, the
ordering attribute is represented by a J-scale defined by the order
CADB and its mirror image BDAC. Although noncollinear with the
second J-scale recovered from the first component, the J-scale recovered
from the second component was indistinguishable from it. The unfolding
set recovered from the third component was incomplete but suggested a
contrast between superior scholarship and teacher dominance of the
classroom.

DISCUSSION 

The conclusions to be reached from the investigation reported here
are methodological rather than substantive. In the data obtained, it is
clear that individual ratings were decomposable into more than one
nonerror component, but no claim is made that these results would
generalize to another sample of raters, another set of rating scales, or
another set of events. The analytical procedures offer the possibility
of providing more information about the quality of ratings than is
provided by more traditional reliability estimation procedures and
provide a basis for selecting raters having rating styles of particular
interest, as suggested by Anderson and Hunka (1964). The interpretations
of the generalized rating styles are somewhat tentative because of the
small number of events observed. Work is underway to compare the results
of this form of analysis to the results of reliability estimation based
on analysis of variance of the events by scales by raters classification.
In addition, production of additional videotapes is underway to provide
a larger number of events leading to a more adequate characterization of
individual rating styles.
APPENDIX




CLASSROOM OBSERVATION JUDGEMENT SCHEDULE

Observer                    Class

 1. Teacher's preparation for class meeting.
    no evidence of preparation | moderately well prepared | very carefully prepared

 2. Teacher's ability to arouse pupils' interest.
    majority of pupils inattentive | pupils mildly interested | pupils' interest very high

 3. Teacher's organization of instructional material.
    no sign of system or order | some organization apparent | organization clearly apparent

 4. Topic emphasis; balance between fundamentals and trivia.
    neglects fundamentals for trivia | half fundamentals; half trivia | stresses fundamentals; disregards trivia

 5. Scholarship; knowledge of subject matter.
    clearly deficient | textbook competency | clearly superior

 6. Ability to express ideas.
    inarticulate; obscure | rather hesitant; slightly obscure | fluent; clear

 7. Integration of lesson topics.
    lesson topics isolated | some integration of lesson topics | all topics integrated

 8. Acceptance of pupils' ideas.
    rejects all pupil ideas | accepts ideas having merit | accepts all pupils' ideas

 9. Acceptance of pupils' behavior.
    highly critical | critical of extreme deviancy | highly permissive

10. Attitude toward pupils.
    unsympathetic; inconsiderate | generally somewhat considerate | courteous and considerate

11. Social distance from pupils.
    faultfinding; unfriendly | serious; somewhat reserved | conversational; friendly

12. Formality of classroom procedures.
    rigidly formal; structured | rather informal; somewhat structured | informal; unstructured



13. Manifest anxiety in classroom.
    highly tense; anxious | generally relaxed; some tension | no sign of anxiety

14. Discipline and order in classroom.
    order strictly maintained | some disorder but no nonsense | pupils self-regulating

15. Verbal output initiated by teacher.
    [anchors illegible]

16. Relative information contribution of teacher.
    [anchors illegible]

17. Size of classroom group(s) with which teacher is involved.
    1 or 2 pupils | half of class | nearly all of class

18. Degree of teacher involvement with group(s).
    minimal involvement | involvement limited to guidance | "active" participation in all groups

19. Determination of topics to be considered.
    determined by class interests | teacher determination modified by class interests | total teacher determination

20. Task focus.
    focus on critical analysis of sources of facts | some critical analysis of sources of factual content | focus on factual content

21. Inductive-deductive focus of class.
    topic sequence from facts to generalization | facts and generalizations in no sequence | topic sequence from generalization to specific facts

TABLE 1

Characteristic Roots of Covariance Matrix

  k    Root x 10^[?]   Increment (between    Percent of   Cumulative Percent
                       successive roots)     Variance     of Variance
  1        9.891             --                29.91          29.91
  2        3.555            6.336              10.75          40.66
  3        1.512            2.043               4.57          45.23
  4        1.367             .145               4.13          49.36
  5        1.252             .115               3.79          53.15
  6        1.135             .117               3.43          56.58
  7         .924             .211               2.80          59.38
  8         .801             .123               2.42          61.80
  9         .783             .018               2.37          64.17
 10         .709             .074               2.14          66.31
 11         .674             .035               2.04          68.35
 12         .599             .075               1.81          70.16
 13         .582             .017               1.76          71.92
 14         .554             .028               1.68          73.60
 15         .496             .058               1.50          75.10
 16         .452             .044               1.37          76.47
 17         .440             .012               1.33          77.80
 18         .437             .003               1.32          79.12
 19         .392             .045               1.19          80.31
 20         .384             .008               1.16          81.47
 21         .357             .027               1.08          82.55

TABLE 2

Characteristic Roots of Correlation Matrix

  k      Root      Increment (between    Percent of   Cumulative Percent
                   successive roots)     Variance     of Variance
  1     23.732           --                28.59          28.59
  2      9.403         14.329              11.47          39.92
  3      3.768          5.635               4.54          44.46
  4      3.245           .523               3.91          48.37
  5      3.017           .228               3.64          52.01
  6      2.828           .189               3.40          55.41
  7      2.297           .531               2.77          58.18
  8      2.065           .232               2.49          60.67
  9      1.988           .077               2.11          63.06
 10      1.848           .140               2.23          65.29
 11      1.572           .276               1.91          67.18
 12      1.560           .012               1.88          69.06
 13      1.458           .102               1.76          70.82
 14      1.403           .055               1.69          72.51
 15      1.229           .174               1.48          73.99
 16      1.192           .037               1.44          75.43
 17      1.128           .064               1.36          76.79
 18      1.073           .055               1.29          78.08
 19      1.034           .039               1.24          79.32
 20       .979           .055               1.18          80.50
 21       .945           .034               1.14          81.64




TABLE 3

Factor Loadings of Raters on Principal
Components of Correlation Matrix

Rater     I        II        Rater     I        II
  1     .756     -.209        43     .656     -.215
  2     .770     -.239        44     .^6      -.277
  3     .650     -.298        45     .670     -.235
  4     .669     -.271        46     .681     -.099
  5     ,6ta     -.383        47     .397     -.382
  6     .499     -.388        48     .764     -.249
  7     .109      .023        49     .736     -.076
  8     .310     -.390        50     .645     -.146
  9     .704     -.208        51     .658     -.329
 10     .670     -.211        52     .623     -.129
 11     .535     -.368        53     .733     -.033
 12     .770     -.251        54     .368     -.064
 13     •6Uj     -.181        55     .670     -.183
 14     .739     -.211        56     .655     -.116
 15     .W2      -.077        57     .497     -.076
 16     .526     -.038        58     .623     -.334
 17     .222      .1^68       59     .617     -.296
 18     .699     -.168        60     .515     -.216
 19     .522     -.221        61     .718     -.274
 20     .357      .248        62     .762     -.296
 21     .375      .319        63     .767     -.266
 22     .560      •iil*7      64     .461     -.273
 23     .588      .210        65     .325      >1*39
 24     .482      .291        66     .596      .269
 25     .30^^     .545        67     .459      .480
 26     .357      .348        68     .210      .384
 27     .268      .361        69     .066      .100
 28     .404      .552        70     .262      .512
 29     .m        .452        71     .210      .463
 30     .344      .587        72     .718     -.282
 31     .386      .361        73     .241      .555
 32     .290      .466        74     .454      .507
 33     .Wt       .242        75     .367      .ia6
 34     .517      .306        76     .358      .510
 35     .478      .232        77     .260      .208
 36     •ll93     .165        78     .3ia      .584
 37     .185      .354        79     .348      .612
 38     .409      .306        80     .583      .285
 39     .467      .435        81     .778     -.319
 40     .336      .349        82     .292      .569
 41     .138      .217        83     .484      .267
 42     .587      .420

TABLE 5

Observed Orders and J-Scale for First Principal Component

Order    Frequency
ABCD         1
ADBC         3
ADCB         1
BADC         2
BDAC         3
BDCA         1
CADB         1
CBDA         2
DABC         2
DACB         1
DBAC         1
DBCA         1
DCBA         1

J-Scale I      J-Scale II
BDAC           DCBA
DBAC           (CDBA)
DABC           CBDA
ADBC DACB      (BCDA) (CBAD)
ADCB           (BCAD)
(ACDB)         (BACD)
CADB           ABCD

The orders in parentheses were not observed.


TABLE 6

Observed Orders and J-Scale for Second Principal Component

Order:      ABDC  ACBD  ADBC  BADC  CADB  CDAB  CDBA  DABC  DCAB
Frequency:  1  1  1  1  4  1  $  1

J-Scale
CDAB
DCAB
(DACB)
DABC
ADBC
ABDC
BADC

The order in parentheses was not observed.

TABLE 7

Observed Orders and J-Scale for Third Principal Component

Order    Frequency
ABCD         1
ACDB         1
BDAC         1
CABD         4
CADB         2
CBAD         3
CBDA         3
CDAB         1
CDBA         3
DACB         2

J-Scale
CADB
ACDB
(ADCB)
DACB
(ABDC)
(BADC)
BDAC

The orders in parentheses were not observed.




